Nonparametric Density Estimation: Toward Computational Tractability

Authors

  • Alexander G. Gray
  • Andrew W. Moore
Abstract

Density estimation is a core operation of virtually all probabilistic learning methods (as opposed to discriminative methods). Approaches to density estimation fall into two principal classes: parametric methods, such as Bayesian networks, and nonparametric methods, such as kernel density estimation and smoothing splines. While neither choice should be universally preferred, a well-known benefit of nonparametric methods is their ability to achieve estimation optimality for any input distribution as more data are observed. No model with a parametric assumption can have this property, and it is of great importance in exploratory data analysis and mining, where the underlying distribution is decidedly unknown. To date, however, despite a wealth of advanced underlying statistical theory, the use of nonparametric methods has been limited by their computational intractability for all but the smallest datasets. In this paper, we present an algorithm for kernel density estimation, the chief nonparametric approach, which is dramatically faster than previous algorithmic approaches in terms of both dataset size and dimensionality. Furthermore, the algorithm provides arbitrarily tight accuracy guarantees, provides anytime convergence, works for all common kernel choices, and requires no parameter tuning. The algorithm is an instance of a new principle of algorithm design: multi-recursion, or higher-order divide-and-conquer.
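To make the cost concrete, here is a minimal Python sketch under stated assumptions: a Gaussian kernel with bandwidth h, and unnormalized kernel sums (divide by N(2πh²)^(d/2) to obtain the density estimate f̂(q) = (1/N) Σᵢ K_h(q − xᵢ)). The names `naive_kde`, `tree_kde`, `Node`, `box_sq_dists`, and `LEAF_SIZE` are hypothetical. `naive_kde` is the direct estimator, costing O(NM) kernel evaluations for M queries and N points. `tree_kde` is a simplified *single-tree* kd-tree approximation with an absolute error bound ε per query. It is not the authors' multi-recursive algorithm: it prunes a node when every point inside it must have a kernel value within a band of width 2ε/N, so the total error per query stays within ε.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]
LEAF_SIZE = 16  # hypothetical tuning constant: max points stored in a kd-tree leaf


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_kernel(sq_d: float, h: float) -> float:
    """Unnormalized Gaussian kernel exp(-d^2 / (2 h^2)); decreasing in distance."""
    return math.exp(-sq_d / (2.0 * h * h))


def naive_kde(queries: List[Point], data: List[Point], h: float) -> List[float]:
    """Exact kernel sums: M queries x N points = O(N*M) kernel evaluations."""
    return [sum(gaussian_kernel(sq_dist(q, x), h) for x in data) for q in queries]


class Node:
    """kd-tree node holding a bounding box, a point count, and (at leaves) points."""

    def __init__(self, points: List[Point]) -> None:
        dim = len(points[0])
        self.count = len(points)
        self.lo = [min(p[j] for p in points) for j in range(dim)]
        self.hi = [max(p[j] for p in points) for j in range(dim)]
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.points: Optional[List[Point]] = None
        if self.count <= LEAF_SIZE:
            self.points = points  # leaves keep their points for exact sums
            return
        # Split at the median of the widest bounding-box dimension.
        j = max(range(dim), key=lambda k: self.hi[k] - self.lo[k])
        pts = sorted(points, key=lambda p: p[j])
        mid = len(pts) // 2
        self.left = Node(pts[:mid])
        self.right = Node(pts[mid:])


def box_sq_dists(q: Point, lo: List[float], hi: List[float]) -> Tuple[float, float]:
    """Smallest and largest squared distance from q to any point in the box."""
    dmin = dmax = 0.0
    for qj, l, u in zip(q, lo, hi):
        gap = max(l - qj, 0.0, qj - u)  # 0 when qj lies inside [l, u]
        far = max(abs(qj - l), abs(qj - u))
        dmin += gap * gap
        dmax += far * far
    return dmin, dmax


def _query(q: Point, node: Node, h: float, tol_per_point: float) -> float:
    dmin, dmax = box_sq_dists(q, node.lo, node.hi)
    k_hi = gaussian_kernel(dmin, h)  # nearest possible point: largest kernel value
    k_lo = gaussian_kernel(dmax, h)  # farthest possible point: smallest kernel value
    # Every point in the node contributes a value in [k_lo, k_hi]; using the
    # midpoint errs by at most (k_hi - k_lo) / 2 per point.
    if (k_hi - k_lo) / 2.0 <= tol_per_point:
        return node.count * (k_hi + k_lo) / 2.0
    if node.points is not None:  # leaf that cannot be pruned: exact sum
        return sum(gaussian_kernel(sq_dist(q, x), h) for x in node.points)
    return _query(q, node.left, h, tol_per_point) + _query(q, node.right, h, tol_per_point)


def tree_kde(queries: List[Point], data: List[Point], h: float, eps: float) -> List[float]:
    """Kernel sums with |approx - exact| <= eps per query (up to float rounding)."""
    root = Node(list(data))
    tol_per_point = eps / len(data)  # each reference point may absorb eps/N of error
    return [_query(q, root, h, tol_per_point) for q in queries]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
    queries = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(100)]
    exact = naive_kde(queries, data, h=0.2)
    approx = tree_kde(queries, data, h=0.2, eps=1e-3)
    print(max(abs(a - e) for a, e in zip(approx, exact)))  # bounded by eps = 1e-3
```

The error budget is split evenly across the N reference points, so a node is pruned only when its kernel-value spread is small relative to ε/N; any point not covered by a pruned node is evaluated exactly. The bound relies only on the kernel decreasing monotonically with distance, which holds for the Gaussian, Epanechnikov, and other common kernels. A dual-tree variant would also build a tree over the queries and prune query-node/reference-node pairs jointly, presumably closer to the higher-order divide-and-conquer the abstract describes; this sketch does not attempt that.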

Similar resources

Statistical Topology Using the Nonparametric Density Estimation and Bootstrap Algorithm

This paper presents approximate confidence intervals for each function of parameters in a Banach space based on a bootstrap algorithm. We apply a kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality of the distribution function estimator of random variables using integrated mean square error (IMSE). The results of simulation studies show a significant impro...

Efficient Nonparametric Density Estimation on the Sphere with Applications in Fluid Mechanics

The application of nonparametric probability density function estimation for the purpose of data analysis is well established. More recently, such methods have been applied to fluid flow calculations since the density of the fluid plays a crucial role in determining the flow. Furthermore, when the calculations involve directional or axial data, the domain of interest falls on the surface of the...

A sampling algorithm for bandwidth estimation in a nonparametric regression model with a flexible error density

We propose to approximate the unknown error density of a nonparametric regression model by a mixture of Gaussian densities with means being the individual error realizations and variance a constant parameter. This mixture density has the form of a kernel density estimator of error realizations. We derive an approximate likelihood and posterior for bandwidth parameters in the kernel–form error d...

Bayesian bandwidth estimation for a nonparametric functional regression model with unknown error density

Error density estimation in a nonparametric functional regression model with functional predictor and scalar response is considered. The unknown error density is approximated by a mixture of Gaussian densities with means being the individual residuals, and variance as a constant parameter. This proposed mixture error density has the form of a kernel density estimator of residuals, where the regre...

Fast Estimation of Nonparametric Kernel Density Through PDDP, and its Application in Texture Synthesis

In this work, a new algorithm is proposed for fast estimation of nonparametric multivariate kernel density, based on principal direction divisive partitioning (PDDP) of the data space. The goal of the proposed algorithm is to use the finite support property of kernels for fast estimation of density. Compared to earlier approaches, this work explains the need for using boundaries (for partitioning the sp...

Publication date: 2003